NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Benign Samples Matter! Fine-tuning On Outlier Benign Samples Severely Breaks Safety

Guan, Zihan; Hu, Mengxuan; Zhu, Ronghang; Li, Sheng; Vullikanti, Anil (July 2025, The Forty-Second International Conference on Machine Learning)

Recent studies have uncovered a troubling vulnerability in the fine-tuning stage of large language models (LLMs): even fine-tuning on entirely benign datasets can lead to a significant increase in the harmfulness of LLM outputs. Building on this finding, our red teaming study takes this threat one step further by developing a more effective attack. Specifically, we analyze and identify samples within benign datasets that contribute most to safety degradation, then fine-tune LLMs exclusively on these samples. We approach this problem from an outlier detection perspective and propose Self-Inf-N, to detect and extract outliers for fine-tuning. Our findings reveal that fine-tuning LLMs on 100 outlier samples selected by Self-Inf-N in the benign datasets severely compromises LLM safety alignment. Extensive experiments across seven mainstream LLMs demonstrate that our attack exhibits high transferability across different architectures and remains effective in practical scenarios. Alarmingly, our results indicate that most existing mitigation strategies fail to defend against this attack, underscoring the urgent need for more robust alignment safeguards.
more » « less
Free, publicly-accessible full text available July 14, 2026
Revisiting Source-Free Domain Adaptation: Insights into Representativeness, Generalization, and Variety

Zhu, Ronghang; Hu, Mengxuan; Zhuang, Weiming; Lyu, Lingjuan; Yu, Xiang; Li, Sheng (June 2025, The IEEE/CVF Conference on Computer Vision and Pattern Recognition)

Domain adaptation addresses the challenge where the distribution of target inference data differs from that of the source training data. Recently, data privacy has become a significant constraint, limiting access to the source domain. To mitigate this issue, Source-Free Domain Adaptation (SFDA) methods bypass source domain data by generating source-like data or pseudo-labeling the unlabeled target domain. However, these approaches often lack theoretical grounding. In this work, we provide a theoretical analysis of the SFDA problem, focusing on the general empirical risk of the unlabeled target domain. Our analysis offers a comprehensive understanding of how representativeness, generalization, and variety contribute to controlling the upper bound of target domain empirical risk in SFDA settings. We further explore how to balance this trade-off from three perspectives: sample selection, semantic domain alignment, and a progressive learning framework. These insights inform the design of novel algorithms. Experimental results demonstrate that our proposed method achieves state-of-the-art performance on three benchmark datasets--Office-Home, DomainNet, and VisDA-C--yielding relative improvements of 3.2%, 9.1%, and 7.5%, respectively, over the representative SFDA method, SHOT.
more » « less
Free, publicly-accessible full text available June 11, 2026
Open-Set Graph Domain Adaptation via Separate Domain Alignment

https://doi.org/10.1609/aaai.v38i8.28765

Wang, Yu; Zhu, Ronghang; Ji, Pengsheng; Li, Sheng (March 2024, Proceedings of the AAAI Conference on Artificial Intelligence)

Domain adaptation has become an attractive learning paradigm, as it can leverage source domains with rich labels to deal with classification tasks in an unlabeled target domain. A few recent studies develop domain adaptation approaches for graph-structured data. In the case of node classification task, current domain adaptation methods only focus on the closed-set setting, where source and target domains share the same label space. A more practical assumption is that the target domain may contain new classes that are not included in the source domain. Therefore, in this paper, we introduce a novel and challenging problem for graphs, i.e., open-set domain adaptive node classification, and propose a new approach to solve it. Specifically, we develop an algorithm for efficient knowledge transfer from a labeled source graph to an unlabeled target graph under a separate domain alignment (SDA) strategy, in order to learn discriminative feature representations for the target graph. Our goal is to not only correctly classify target nodes into the known classes, but also classify unseen types of nodes into an unknown class. Experimental results on real-world datasets show that our method outperforms existing methods on graph domain adaptation.
more » « less
Full Text Available
Unsupervised Class-imbalanced Domain Adaptation with Pairwise Adversarial Training and Semantic Alignment

https://doi.org/10.1109/TCSVT.2024.3427428

Shi, Weili; Zhu, Ronghang; Li, Sheng (January 2024, IEEE Transactions on Circuits and Systems for Video Technology)

Full Text Available
Semi-Supervised Single Domain Generalization with Label-Free Adversarial Data Augmentation

Zhu, Ronghang; Yu, Xiang; Li, Sheng (November 2023, Transactions on Machine Learning Research (TMLR))

Full Text Available

Search for: All records